<html>
<head>
<title>Finding memory bugs in Win32 applications with Valgrind</title>
</head>
<body>
<h1>Finding memory bugs in Win32 applications with Valgrind</h1>
DRAFT<br>
<b>Dan Kegel<br>
November 2009<br>
Google</b>

<h2>Dynamic Analysis</h2>
Writing good software is hard.  
Programmers must always be on the lookout for bugs.  
One of the 
<a href="http://en.wikipedia.org/wiki/Software_engineering">many techniques</a>
developed to help programmers do this is called
<a href="http://en.wikipedia.org/wiki/Dynamic_program_analysis">dynamic analysis</a>.
In dynamic analysis, programs are run in a special environment
that watches for flaws such as invalid pointers, undefined values, 
or race conditions.
<p>
The classic dynamic analysis tool is 
<a href="http://en.wikipedia.org/wiki/IBM_Rational_Purify">Purify</a>.
It's good, but expensive.   The free software world developed a similar tool,
<a href="http://valgrind.org">Valgrind</a>, which is also good, but 
<a href="http://tml-blog.blogspot.com/2009/07/porting-valgrind-to-windows.html">the windows port is not yet ready</a>.
<p>
However, Linux can run Windows apps via the 
<a href="http://winehq.org">Wine</a> 
compatibility layer, and Valgrind is compatible with Wine.
This article investigates whether Valgrind+wine can actually replace 
some of the uses of Purify.

<h2>Tools</h2>

Valgrind needs a patch or two to work well with Wine;
<a href="http://wiki.winehq.org/Valgrind">http://wiki.winehq.org/Valgrind</a>
describes how to build it.
<p>
Wine needs to be built after installing Valgrind so it can use
Valgrind's macros to decorate its heap operations -- and
the /usr/include/valgrind/valgrind.h file must be new
enough to contain the lines
<pre>
          /* Wine support */
          VG_USERREQ__LOAD_PDB_DEBUGINFO = 0x1601
</pre>
This happens automatically if you install valgrind with --prefix /usr,
but since I prefer to install it into /usr/local/valgrind-NNNN, 
I usually do 
<pre>
          cd /usr/include; sudo mv valgrind valgrind.orig; sudo ln -s /usr/local/valgrind-NNNN/include/valgrind valgrind
</pre>
<p>
You can check that this was done properly by verifying
that wine's include/config.h sets HAVE_VALGRIND_MEMCHECK_H,
and that valgrind gives good symbolic stacks for
errors in win32 code.
<p>
Once you have Wine installed, run Wine's conformance tests 
to verify that is operating properly.  (A few test failures
are ok; see 
<a href="http://test.winehq.org">http://test.winehq.org</a>
for current test results from other users.)

<p>
Once both Valgrind and Wine were installed and working, 
I ran Wine's conformance tests under Valgrind
as described at 
<a href="http://wiki.winehq.org/Valgrind">http://wiki.winehq.org/Valgrind</a>
to see what kind of background warnings to expect from
wine and the operating system under Valgrind.
This takes about five hours.  You can see my results at
<a href="http://kegel.com/wine/valgrind/logs/">http://kegel.com/wine/valgrind/logs/</a>
for comparison.   
<p>
I put together a suppression file for uninteresting warnings at
<a href="http://winezeug.googlecode.com/svn/trunk/valgrind/valgrind-suppressions">http://winezeug.googlecode.com/svn/trunk/valgrind/valgrind-suppressions</a>
and filed bugs against Wine for a number
of these problems; see 
<a href="http://bugs.winehq.org/buglist.cgi?quicksearch=valgrind">http://bugs.winehq.org/buglist.cgi?quicksearch=valgrind</a>
.
<p>

With the tools more or less working properly, I moved on
to trying them out on a real win32 project.

<h2>Chromium, Valgrind, and Win32</h2>
When the Chromium project needed to add dynamic analysis to its continuous
build and test rig, it used Purify on Win32, and Valgrind on Linux and Mac.
To investigate whether Chromium could use Valgrind for Win32, I wrote
two scripts: one to extract the tests, and one to run them.

<h3>Building and Extracting Tests</h3>
Here are the steps I follow to bring up the test environment:
<ol>
<li>Set up Chromium build environment<br>
<a href="http://dev.chromium.org">http://dev.chromium.org</a>
describes how to build Chromium on Windows.
It takes about a day to install Visual Studio and the needed
SDKs and hotfixes, and sadly, there is no easy way to automate this.
<li>Do a debug build of the Chrome solution<br>
(This takes about 500 minutes on a 1GB Core 2 Duo, 80 minutes on the same machin with 4GB RAM,
and about 20 minutes on an 8 core box with 12 GB of RAM and two hard drives.)
<li>Run the tests on Windows with the cygwin shell script
<a href="http://winezeug.googlecode.com/svn/trunk/testsuites/chromium/chromium-runtests.sh">http://winezeug.googlecode.com/svn/trunk/testsuites/chromium/chromium-runtests.sh</a>
.
The results are placed in the logs subdirectory.
<li>Check the results of the script<br>
If any tests misbehave, add lines to the list of expected failures in the script
with the following tags:
<ul>
<li>dontcare-fail-win:  tests that fail even on windows
<li>dontcare-hang-win:  tests that hang even on windows
<li>dontcare-flaky-win: tests that fail sometimes even on windows
</ul>
(You can see the list with "sh chromium-runtests.sh --list-failures".)
<li>Package the tests <br>
Run
<a href="http://winezeug.googlecode.com/svn/trunk/testsuites/chromium/chromium-savetests.sh">http://winezeug.googlecode.com/svn/trunk/testsuites/chromium/chromium-savetests.sh</a>
.
This saves a sizeable subset of Chromium's test suite 
(roughly everything but the layout and ui tests) together
with .pdb files into an archive file.
(I have a prebuilt copy at
<a href="http://kegel.com/wine/chromium/chromium-tests.tar.bz2">http://kegel.com/wine/chromium/chromium-tests.tar.bz2</a>
for people who don't want to set up a chromium build environment.)
<li>Verify the archived tests work properly on Windows<br>
Unpack that archive in a different directory of the windows machine,
run the tests again, and verify that no tests fail.  (If they
do, either there's a flaky test that needs to be fixed or disabled,
or the script that archives the tests is buggy and needs fixing.)
<li>Verify the archived tests work properly on Linux<br>
Copy the archive to the Linux system where valgrind and wine are installed.
(Because that script directly references the valgrind suppressions file mentioned
above, it's easiest to grab a copy of the winezeug tree with the
command "svn checkout http://winezeug.googlecode.com/svn/trunk/ winezeug",
then cd to winezeug/testcases/chromium and unpack the tarball there.)
<br>
Then run "chromium-runtests.sh" again to see how the tests did
on Wine (without Valgrind).

<li>Check the results of the script<br>
Triage any misbehaving tests by adding lines to the list of expected failures with the following
tags:
<ul>
<li>dontcare:  failing tests that don't seem useful or practical to fix on Wine
<li>fail:  tests that fail on wine
<li>fail-wine-vmware:  tests that fail on wine under vmware because of a vmware problem
<li>hang:  tests that hang on wine
<li>crash:  tests that crash on wine
<li>flaky:  tests that fail sometimes on wine
</ul>
If any problems turn out to be bugs in either the test scripts
or in the Chromium testcases, fix those (or file a chromium bug).
If any seem to be Wine's fault, file bugs against Wine for those,
and add URLs for the Wine bug reports to the tags; 
"sh chromium-runtests.sh --list-failures-html" outputs the 
<a href="http://kegel.com/wine/chromium/failures.html">expected failure list in HTML with hyperlinks to the bug reports</a>.
<li>
As the Wine community fixes problems, remove the corresponding lines from the list of expected failures.
</ol>

<h3>Using the tools</h3>
Finally, with the tools installed and tests validated,
I then ran the tests under valgrind with the command
"sh chromium-runtests.sh --valgrind".  It takes about two hours
to run this subset of tests under Valgrind on a Core 2 Duo --
unless there's a crash; handling crashes takes up about
two gigabytes of RAM and many minutes (the pdb reading code
of both wine and valgrind is relatively inefficient).
<p>
The Chromium team runs Valgrind on the Linux and Mac tests continuously.
When Valgrind finds a flaw, the developer dealing with it either fixes
it immediately, or files a bug, then either adds a suppression
to one of the valgrind suppression files, or disables the test
(in the case of a crash).  The goal is to keep the tree green
so that any new failures stand out vividly.
<p>
The same strategy is followed when running Valgrind on the
Windows tests; either fix the problems (if you can), or
file bugs and suppress the error or disable the test,
such that all the tests pass under Valgrind without any
reported warnings.  Then work can continue, and the problems
recorded in the suppressions files and expected failure list
can be fixed offline.

<h3>Issue: memcheck unit tests not ported</h3>
Valgrind's memcheck has a nice set of unit tests, but it's
hard to run them on valgrind+wine to see whether
valgrind works as it should there.
<p>
The test suite should be ported to Visual C++
and we should use it to find and fix problems in
the valgrind+wine combination.  If it doesn't have
tests for C++, those need to be added.

<h3>Issue: detecting overruns</h3>
Valgrind replaces the linux heap with one that puts guard bytes before and after
each allocation, and marks those guard bytes as inaccessible.
At the moment, nothing does this when running win32 apps
under wine+valgrind; wine does inform Valgrind about the
allocations made by the NT heap, but it doesn't put any
guard bytes around them.
<p>
So, how can we achieve this on Wine?  The Wine philosophy
is to look for how Windows likes to do things, and sure enough,
there are ways to tell the Windows kernel to do similar things.
http://technet.microsoft.com/en-us/library/cc736347(WS.10).aspx
describes the gflags utility, used to set registry flags that end
up controlling the NtGlobalFlags field of each processes' PEB.
<p>
It would probably suffice to implement the
FLG_HEAP_ENABLE_TAIL_CHECK bit of NtGlobalFlags in Wine,
and add a Valgrind annotation for the guard bytes.
This would improve memory error detection in Wine even
without Valgrind, too.

<h3>Issue: detecting double frees</h3>
Valgrind's replacement heap also keeps freed blocks out of
circulation for a while, marking them completely inaccessible
so that references to freed blocks are caught.
<p>
Wine's implementation of the win32 system heap should probably 
provide this feature, possibly triggered by the
FLG_HEAP_DISABLE_COALESCING NtGlobalFlags bit.

<h3>Issue: userspace heaps getting in the way</h3>
Visual C++'s C runtime library may have its own heap
implementation on top of the Win32 system heap.
Likewise, applications may override the default ::operator new
and malloc with their own heap implementations.
All of these can get in the way of the valgrind-aware
system heap described above.
<p>
Valgrind should probably intercept calls to ::operator new,
malloc, and friends, and redirect them straight to the win32
system heap (but see below).
<p>
However, it may be sufficient to run Chromium
with the system heap selected by setting
CHROME_ALLOCATOR=winheap as described in
<a href="http://dev.chromium.org/developers/page-heap-for-chrome">Page Heap for Chrome</a>.
That's next on my list of things to try.

<h3>Issue: detecting delete/delete[]/free mismatches</h3>
On Linux, Valgrind's replacement heap marks each block
with a type so mismatches between the various allocation
types can be detected.  
<p>
Valgrind should probably intercept calls to ::operator new,
malloc, and friends, and wrap them with a type label before
redirecting them to the win32 system heap.

<h3>Issue: valgrind.h doesn't compile with Visual C++</h3>
win32 apps can't currently include valgrind.h, so they
can't use Valgrind's client hooks to inform Valgrind
about functions expected to generate exceptions, etc.
<p>
valgrind.h should be ported to visual C++; see 
<a href="https://bugs.kde.org/show_bug.cgi?id=210935">valgrind bug 210935</a>.

<h3>Issue: spurious warnings from _strlen</h3>
Library routines that process strings a word at a time
seem to generate false warnings like
<pre>
 Conditional jump or move depends on uninitialised value(s)
    at 0x401086: _strlen (in /home/dank/test2.exe)
</pre>
Valgrind has custom versions of these functions for Linux.
<p>
Valgrind should probably have custom versions of these 
functions for win32 as well; see 
<a href="https://bugs.kde.org/show_bug.cgi?id=190660">valgrind bug 190660</a>.
Until then, using the debugging C runtime library may be a sufficient workaround.

<h3>Issue: Slowdown from _chkstk?</h3>
Win32 apps that have large stack variables issue a large number of calls to _chkstk(),
all? of which trigger Valgrind warnings (at least when --check-smc=all
is given).  (See <a href="http://support.microsoft.com/kb/100775">kb100775</a>.)
It may also interfere with --track-origins.
<p>
Valgrind could know about _chkstk() at a lower level than it currently does;
see <a href="http://old.nabble.com/memcheck---track-origins%3Dyes-vs-alloca-__chkstk-td18308025.html">John Reiser's note from 2008</a>.
User apps 

<h3>Issue: Incomplete pdb file support in Valgrind</h3>
Some small test programs don't get symbolic stack dumps
in Valgrind because of incomplete pdb file support.
<p>
Valgrind should copy the fix for this issue from Wine.
See <a href="https://bugs.kde.org/show_bug.cgi?id=211529">valgrind bug 211529</a>.
</body>
</html>
